Keto-CTA Study

An Analysis of the Available Data

John (The John & Calvin Podcast)

Incorrect Y-Axis Tick Labels

Figure 1B from Study

Figure 1B from Data

Individual Change in Plaque Volume

(B) The red line represents the median change (0.8%), and the shaded area represents the IQR (0.3%-1.7%).

Incorrect Shading Area

Figure 1A from Study

Figure 1A from Data

Individual Change in Plaque Volume

(A) The red line represents the median change (18.9 mm3), and the shaded area represents the IQR (9.3-47.0 mm3).

Incorrect Y-Axis Tick Labels

Figure 2F from Study

Figure 2F from Data

Changes in Total Plaque Score vs Coronary Artery Calcium

(C, F) Only CAC is associated with changes in NCPV and TPS. The regression line was fitted with the function “lm,” which regresses y~x, and the shaded area represents the standard error.

Linear Model Assumptions

4 Simple Linear Regression Assumptions

3 are testable with the data (the fourth, independence of errors, depends on study design):

  • Linearity: between the predictor and the outcome

  • Constant variance (homoscedasticity) of residuals

  • Normally distributed residuals


These linear assumptions are quantifiable and objectively testable.

  • If the assumptions don’t hold, statistical significance and uncertainty estimates aren’t trustworthy
  • Results may be invalid
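The three testable assumptions above can each be checked with a standard quantitative test. A minimal sketch in Python (the study itself used R; the data here are synthetic, invented for illustration): Shapiro-Wilk for residual normality, a hand-rolled Breusch-Pagan test for constant variance, and a RESET-style F-test for linearity.

```python
import numpy as np
from scipy import stats

# Synthetic data (illustration only, not the study's data):
# noise standard deviation deliberately grows with x.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x + 0.1, n)

# Fit OLS: y = b0 + b1*x
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# 1) Residual normality: Shapiro-Wilk
_, p_norm = stats.shapiro(resid)

# 2) Constant variance: Breusch-Pagan (regress squared residuals on x;
#    LM statistic = n * R^2 of the auxiliary regression, chi2 with 1 df)
e2 = resid ** 2
g, *_ = np.linalg.lstsq(X, e2, rcond=None)
r2_aux = 1 - np.sum((e2 - X @ g) ** 2) / np.sum((e2 - e2.mean()) ** 2)
p_bp = stats.chi2.sf(n * r2_aux, df=1)

# 3) Linearity: RESET-style F-test (does adding fitted^2 improve the fit?)
X2 = np.column_stack([X, (X @ beta) ** 2])
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
rss1, rss2 = np.sum(resid ** 2), np.sum((y - X2 @ b2) ** 2)
f_stat = (rss1 - rss2) / (rss2 / (n - 3))
p_reset = stats.f.sf(f_stat, 1, n - 3)
```

Because the variance is built to grow with x, Breusch-Pagan rejects; the assumption tests in the table that follows are of this same quantifiable kind.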

Violations

Actual Assumption Tests

Model             β (p-value)        Linearity                Constant variance        Residual normality
ΔNCPV ~ CACbl     0.18 (p < 0.001)   Violation (p = 0.031)    Violation (p = 0.001)    Violation (p < 0.001)
ΔNCPV ~ NCPVbl    0.25 (p < 0.001)   OK (p = 0.198)           Violation (p < 0.001)    Violation (p < 0.001)
ΔNCPV ~ PAVbl     5.48 (p < 0.001)   Borderline (p = 0.050)   Violation (p < 0.001)    Violation (p < 0.001)
ΔNCPV ~ TPSbl     7.37 (p < 0.001)   OK (p = 0.132)           Violation (p < 0.001)    Violation (p = 0.001)

Objective tests show all 4 models failed at least 2 tests.

Response from Authors

Subjective Assumptions

  • Calling residual-plot evaluation “subjective” is misleading.

  • Visual checks are interpretive, but these linear assumptions are quantifiable and objectively testable.

  • Robust linear regression mainly down-weights outliers.
  • It does not address non-linearity, heteroskedasticity, or non-normality of residuals.
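To make the down-weighting point concrete, here is a hypothetical sketch (not the authors' code) of a Huber-type robust fit via iteratively reweighted least squares: observations with large residuals get weights below 1, but nothing in the procedure addresses a curved mean or changing variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1.0, n)
y[:3] += 15.0  # inject a few gross outliers

X = np.column_stack([np.ones(n), x])

def huber_irls(X, y, c=1.345, iters=50):
    """Huber M-estimator via iteratively reweighted least squares."""
    w = np.ones(len(y))
    for _ in range(iters):
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)   # weighted LS step
        resid = y - X @ beta
        s = np.median(np.abs(resid)) / 0.6745        # robust scale (MAD)
        u = np.abs(resid) / s
        w = np.where(u <= c, 1.0, c / u)             # down-weight big residuals
    return beta, w

beta_rob, w = huber_irls(X, y)
```

The injected outliers end up with small weights and the slope stays near the truth; a curved mean or x-dependent variance would pass through this machinery untouched.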

Conclusions vs Actual Reported Model

CONCLUSIONS: “In lean metabolically healthy people on KD, neither total exposure nor changes in baseline levels of ApoB and LDL-C were associated with changes in plaque.”


Abstract claim component       Model                          Model reported
Δ-plaque vs LDL-C exposure     Δ-NCPV ~ LDL-C exposure        Not reported
Δ-plaque vs LDL-C baseline     Δ-NCPV ~ LDL-C baseline        Not reported
Δ-plaque vs ΔLDL-C             Δ-NCPV ~ ΔLDL-C                Not reported
Δ-plaque vs ApoB exposure      Δ-NCPV ~ ApoB exposure         Not reported
Δ-plaque vs ApoB baseline      Δ-NCPV ~ ApoB baseline         Reported
Δ-plaque vs ΔApoB              Δ-NCPV ~ ΔApoB                 Reported
N/A                            NCPV_final ~ LDL-C exposure    Reported (NCPV_final, PAV_final)

No TPS Model Results

Results: “Neither change in ApoB …, baseline ApoB, nor total LDL-C exposure … were associated with the change in noncalcified plaque volume (NCPV) or TPS.”

“Neither … change in ApoB nor the ApoB level … were associated … with TPS (Figures 2D and 2E, Table 3).” - “changes in and baseline levels of ApoB were not associated with changes in NCPV or TPS”


Figures 2D–2F are Δ-TPS (outcome) panels (vs ΔApoB, ApoB, CAC_bl)


Table 3 has no Δ-TPS models

No Δ-TPS ~ LDL-C exposure models or results anywhere.

Lifetime LDL-C Exposure Calculation

“LDL-C exposure on a KD was calculated by summing the products of the reported days on a KD prior to study commencement and baseline LDL-C on a KD plus the study follow-up days by their final LDL-C.”

\[ \text{LDL-C}_{\text{exp}} = Days_{\text{KD}}\cdot LDL_{\text{baseline}} \;+\; Days_{\text{follow-up}}\cdot LDL_{\text{final}} \]


  • one baseline value for all pre-study KD time and one final value for all follow-up is a coarse simplification
  • standard AUC/time-weighted approach (need multiple measurements)
  • limitation due to resources
  • relies on recall of KD start
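Numerically, the quoted formula is just a two-term weighted sum. A sketch with a hypothetical participant (all values invented; units assumed mg/dL and days):

```python
# Hypothetical participant (illustration only; values are invented)
days_kd_pre_study = 3 * 365   # reported ~3 years on KD before the study
ldl_baseline = 280.0          # baseline LDL-C on KD, mg/dL
days_follow_up = 365          # study follow-up, days
ldl_final = 260.0             # final LDL-C, mg/dL

# LDL-C exposure on KD, in mg/dL * days
ldl_exposure = days_kd_pre_study * ldl_baseline + days_follow_up * ldl_final
print(ldl_exposure)  # 401500.0
```

A single baseline value standing in for all pre-study time (and one final value for all follow-up) is exactly the coarse simplification noted above.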

Lifetime LDL-C Exposure Calculation

“LDL-C exposure on a KD was calculated by summing the products of the reported days on a KD prior to study commencement and baseline LDL-C on a KD plus the study follow-up days by their final LDL-C.”

\[ \text{LDL-C}_{\text{exp}} = Days_{\text{KD}}\cdot LDL_{\text{baseline}} \;+\; Days_{\text{follow-up}}\cdot LDL_{\text{final}} \]

“Estimated lifelong LDL-C additionally included the product of age upon commencing a KD and pre-KD LDL-C.”

\[ \text{Life-LDL-C}_{\text{exp}} = Days_{\text{KD}}\cdot LDL_{\text{baseline}} \;+\; Days_{\text{follow-up}}\cdot LDL_{\text{final}} \;+\; \boldsymbol{\big( Age_{\text{at-KD-start}}\cdot LDL_{\text{pre-KD}} \big)} \]

  • This equation is nonsensical; the units are invalid.
  • It adds day-based terms to a year-based term. It’s like adding miles and inches.
  • The pre-KD term is down-scaled by ~365× relative to the day-based terms.
  • As a result, any associations with outcomes are dominated by keto-diet exposure.
  • If they had converted age to days, “lifetime” exposure would largely reflect how old someone was at KD start.
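The unit mismatch is easy to see with hypothetical numbers (all values invented for illustration): as written, the pre-KD term contributes around 1% of the total; converted to days, it would dominate.

```python
# KD terms are in mg/dL * DAYS; the pre-KD term as written is mg/dL * YEARS
days_kd, ldl_base = 3 * 365, 280.0
days_fu, ldl_fin = 365, 260.0
age_at_kd_start = 40          # years
ldl_pre_kd = 120.0            # mg/dL

kd_terms = days_kd * ldl_base + days_fu * ldl_fin       # mg/dL * days
pre_kd_as_written = age_at_kd_start * ldl_pre_kd        # mg/dL * YEARS
pre_kd_in_days = age_at_kd_start * 365 * ldl_pre_kd     # mg/dL * days

share_as_written = pre_kd_as_written / (kd_terms + pre_kd_as_written)
share_in_days = pre_kd_in_days / (kd_terms + pre_kd_in_days)
print(round(share_as_written, 3), round(share_in_days, 3))  # 0.012 0.814
```

Either way the quantity is distorted: mixed units bury the pre-KD history, and consistent units make "lifetime exposure" essentially a proxy for age at KD start.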

Age “mediation” analysis

“Estimated lifetime LDL-C exposure was only a significant predictor of final NCPV in the univariable analysis but lost significance when age was included as a covariate (Table 3). Both age and lifetime LDL-C exposure lost significance when baseline CAC was included in the model (Table 3).”

This is not a “mediation” analysis. A mediation analysis:

  • tests how an exposure affects an outcome through a middle step (the mediator)
  • estimates an indirect effect (through the mediator) and a direct effect (everything else).
  • requires a pre-specified pathway and proper statistical testing (usually with CIs or bootstraps).

This isn’t mediation analysis

  • No indirect effect was estimated or tested.
  • Age can’t be a mediator of LDL exposure (age isn’t caused by LDL); it’s a confounder.
  • They only compared p-values after adding variables.
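For contrast, a genuine mediation analysis estimates and tests an indirect effect. A minimal sketch on synthetic data (product-of-coefficients with a percentile bootstrap CI; invented variables, not the authors' analysis):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
exposure = rng.normal(size=n)
mediator = 0.6 * exposure + rng.normal(size=n)                  # a-path
outcome = 0.5 * mediator + 0.2 * exposure + rng.normal(size=n)  # b-path + direct

def slope(y, *cols):
    """Coefficient of the first predictor in an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def indirect_effect(idx):
    a = slope(mediator[idx], exposure[idx])                   # exposure -> mediator
    b = slope(outcome[idx], mediator[idx], exposure[idx])     # mediator -> outcome
    return a * b

# Bootstrap CI for the indirect effect a*b
boot = [indirect_effect(rng.integers(0, n, n)) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
```

Comparing p-values after adding age as a covariate does none of this. And since LDL cannot cause age, age cannot sit on the causal path as a mediator in the first place.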

Age “mediation” analysis

“Estimated lifetime LDL-C exposure was only a significant predictor of final NCPV in the univariable analysis but lost significance when age was included as a covariate (Table 3). Both age and lifetime LDL-C exposure lost significance when baseline CAC was included in the model (Table 3).”

They ran (reported) three regressions in sequence.

Conclusion: after adjusting for baseline CAC, neither age nor lifetime LDL-C predicts NCPV_final.

  • CAC explains the association; age / lifetime LDL-exposure don’t matter.

Changes in p-values come from collinearity due to Age being embedded in lifetime LDL-C exposure.

Age is a confounder/proxy for exposure duration, not a mediator.


They did NOT report NCPV_final ~ lifetime LDL-C exposure model results.

“Sensitivity” Analysis

“Sensitivity analyses on participants with >80% of bHB measurements above 0.3 mmol/L (Supplemental Tables 2 to 4) and with high calculated 10-year cardiovascular risk showed similar results to those just reported (Supplemental Table 5).”

This is not a sensitivity analysis. This is a subgroup analysis.

Sensitivity analysis: demonstrate robustness of conclusions to reasonable alternative assumptions or analytic choices

Subgroup analysis: assess differences of effects across subsets of the dataset (e.g., high vs low adherence; low vs high baseline risk).

Properly, this is tested with interaction terms in the full sample, not by splitting the data, running the exact same models on each subset, and eyeballing p-values.
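The interaction-term approach can be sketched on synthetic data (invented variables): one model on the full sample, with a t-test on the interaction coefficient asking whether the slope differs between subgroups.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
x = rng.normal(size=n)                     # e.g., a predictor like ApoB change
adherent = rng.integers(0, 2, n)           # subgroup indicator (e.g., high bHB)
# Data generated with the SAME slope in both subgroups (no true interaction)
y = 1.0 + 0.4 * x + 0.3 * adherent + rng.normal(size=n)

# One model on the full sample with an interaction term
X = np.column_stack([np.ones(n), x, adherent, x * adherent])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# t-statistic for H0: interaction coefficient = 0
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
t_interaction = beta[3] / np.sqrt(cov[3, 3])
```

This tests the subgroup difference directly, with the full sample's power, rather than inferring it from two separate p-values.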

Bayes Factor R-scale

“Bayes factors were calculated…with default settings and an rscale value of 0.8 to contrast a moderately informative prior with a conservative distribution width (to allow for potential large effect sizes) due to the well-documented association between ApoB changes and coronary plaque changes.”

Calling 0.8 “moderately informative” is inaccurate.

R package docs: “medium”, “wide”, “ultrawide” = 0.354, 0.5, 0.707

  • rscaleCont = 0.8 is wider than “ultrawide” → a very diffuse prior that places substantial mass on very large effects
  • No reported prior-sensitivity to alternative r-scales
  • Same r-scale apparently used for all models, no justification.

Fixed description:

“We used a very wide prior on coefficients (rscale = 0.8, wider than the package’s “ultrawide”), which places substantial prior mass on very large effects. This diffuse prior penalizes small-to-moderate effects, requiring substantially stronger evidence than under the ‘wide’ or ‘medium’ defaults to support them.”
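The gap between the named scales and 0.8 is itself quantifiable: under a Cauchy prior with scale r, the prior mass on a large standardized effect (|effect| > 1) roughly doubles going from "medium" to 0.8. A sketch using scipy (scale values as listed from the package docs):

```python
from scipy.stats import cauchy

# BayesFactor's named r-scales vs the study's choice
scales = {"medium": 0.354, "wide": 0.5, "ultrawide": 0.707, "study": 0.8}

# Prior probability that the standardized effect exceeds 1 in magnitude
mass_large = {name: 2 * cauchy.sf(1.0, scale=s) for name, s in scales.items()}
for name, m in mass_large.items():
    print(f"{name:9s} P(|effect| > 1) = {m:.3f}")
```

The 0.8 prior puts over 40% of its mass on |effect| > 1, versus about 22% under "medium", which is why it penalizes small-to-moderate effects.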

Univariable change-score models as primary evidence

Preprint vs Published Version

Title is now: Longitudinal Data From the KETO-CTA Study: Plaque Predicts Plaque, ApoB Does Not

Find and replace ‘begets’ with ‘predicts’.

“Most participants presented with stable NCPV (Figures 1A and 1B), with 1 participant exhibiting a decrease in NCPV”

That interpretation of Figures 1A and 1B was removed and replaced with:

“The median change in NCPV was 18.9 mm3 (IQR: 9.3-47.0 mm3) and the median change in PAV was 0.8% (IQR: 0.3%-1.7%).”

Table 1 median (Q1–Q3) PAV at baseline changed from 1.25% (0.5–3.6) in the preprint to 1.6% (0.5–4.9) in the published version.

The two recurring violations, non-normality and heteroskedasticity, are largely driven by the outcome’s distribution (ΔNCPV) and its mean–variance pattern. Swapping predictors (e.g., ApoB vs CAC) usually won’t fix those, so most univariable Δ models would likely show the same two problems.

  • 2 assumption violations:

The simple linear model breaks two key rules: the residuals aren’t normally distributed and their spread changes with x (heteroskedasticity).

The estimated slope (the “trend”) can still be a good average summary of how y changes with x.

But the usual p-values/confidence intervals from ordinary least squares (OLS) can’t be trusted: the standard-error formula is wrong under heteroskedasticity, and non-normality hurts small-sample tests.

  • If the OLS p-value < 0.05: treat it as suggestive, not definitive. Recompute using heteroskedasticity-robust or bootstrap methods; it may stay significant, or it may not.
  • If the OLS p-value ≥ 0.05: you can’t conclude “no association.” The test might be too noisy or mis-calibrated. Recheck with robust/bootstrapped standard errors and report the effect size with a confidence interval.

In short: the conventional OLS standard errors, t-tests, and CIs are invalid under heteroskedasticity; non-normality further invalidates small-sample t-inference.
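The recommended recomputation can be sketched on synthetic heteroskedastic data (invented values), using the standard HC3 sandwich formula, which inflates each squared residual by its leverage:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
x = rng.uniform(0, 10, n)
y = 1.0 + 0.3 * x + rng.normal(0, 0.2 + 0.3 * x, n)  # spread grows with x

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical OLS slope SE (assumes constant variance -- invalid here)
se_ols = np.sqrt(resid @ resid / (n - 2) * XtX_inv[1, 1])

# HC3 heteroskedasticity-robust SE: sandwich estimator with
# squared residuals inflated by leverage, e_i^2 / (1 - h_i)^2
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)           # leverages h_i
meat = X.T @ (X * ((resid / (1 - h)) ** 2)[:, None])
se_hc3 = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])
```

A pairs bootstrap of the slope is the other common fix; either way the point estimate is unchanged and only the uncertainty is re-estimated.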

  • 3 assumption violations (non-normality, heteroskedasticity, and non-linearity)

In plain terms: the model breaks three core assumptions: residuals aren’t normal, their spread changes with x, and the relationship isn’t actually linear. Because of the changing spread, the usual p-values/intervals are mis-calibrated (they can be too small or too big). Because the relationship isn’t linear, the reported “slope” isn’t a clear effect; it’s just a weighted average of a curved pattern, and its size, or even its sign, may not reflect the true relationship.

  • If the reported OLS p-value < 0.05: this suggests a non-zero average linear trend, but inference is mis-calibrated and the effect has no clear meaning under a misspecified (nonlinear) model. The “significance” may be an artifact.
  • If the reported OLS p-value ≥ 0.05: this is a non-result from a mis-calibrated, misspecified model. It does not justify concluding “no association,” and it may mask real patterns due to the nonlinear form and changing variance.

Interpretation of reported results:

  • p < 0.05: evidence only for a non-zero projected linear component under misspecification; inference is mis-sized and the estimand lacks a clear causal/functional meaning.
  • p ≥ 0.05: absence of evidence from a misspecified, mis-sized test; it does not speak to the presence or absence of a true association.

Assumptions of linear model

Linearity of the mean: the average outcome changes in a straight-line way with the predictor. Why it matters: if the true pattern is curved, the slope summarizes the wrong thing and can misstate direction and size.

Independence of errors: observations don’t carry leftover information about each other (no autocorrelation/clustering). Why it matters: Dependence makes uncertainty estimates too small or too large.

Constant variance (homoskedasticity): the scatter of errors is roughly the same across the predictor. Why it matters: If the spread grows or shrinks, standard errors and p-values from the basic model are mis-calibrated.

Approximately normal errors (mainly for small samples): error terms are roughly bell-shaped. Why it matters: The usual t-tests and confidence intervals rely on this; strong departures undermine those calculations.

Exogeneity / no systematic bias: on average, errors are unrelated to the predictor (no omitted confounders correlated with x). Why it matters: Violations bias the slope itself, not just its uncertainty.

No exact collinearity (relevant in multivariable settings): predictors aren’t exact copies of each other. Why it matters: Otherwise the model can’t isolate individual effects. (Not an issue in a single-predictor model.)

The study’s univariable ‘ΔNCPV ~ APOB’ analysis is not decisive. The appropriate test is APOB’s partial association in a follow-up model that controls for baseline NCPV (and age/sex). Only testing \( H_0\colon \beta_{\text{APOB}} = 0 \) in follow-up ~ baseline + APOB + covariates addresses whether APOB is associated with follow-up independent of baseline.

The reported null does not address whether APOB is associated with follow-up conditional on baseline, age, and sex, which is the clinically relevant estimand.
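That estimand can be sketched on synthetic data (all variable names and values invented): an ANCOVA-style model regresses follow-up NCPV on baseline NCPV plus ApoB, age, and sex, and the ApoB coefficient is the conditional association of interest.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
baseline = rng.gamma(2.0, 30.0, n)        # baseline NCPV-like values, mm^3
apob = rng.normal(100.0, 20.0, n)         # mg/dL
age = rng.normal(55.0, 8.0, n)            # years
sex = rng.integers(0, 2, n)

# Synthetic follow-up with a small true ApoB effect built in
followup = 1.1 * baseline + 0.2 * apob + 0.5 * age + rng.normal(0, 10.0, n)

# follow-up ~ baseline + ApoB + age + sex
X = np.column_stack([np.ones(n), baseline, apob, age, sex])
beta, *_ = np.linalg.lstsq(X, followup, rcond=None)
b_apob = beta[2]  # partial association of ApoB, conditional on baseline, age, sex
```

Testing whether this coefficient is zero, rather than the slope of a univariable change-score model, is what would speak to the clinically relevant question.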